Search CORE

16 research outputs found

Execution strategies for SQL subqueries

Author: César A. Galindo-legaria
Milind M. Joshi
Mostafa Elhemali
Torsten Grabs
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2007
Field of study

Optimizing SQL subqueries has been an active area in database research and the database industry throughout the last decades. Pre-vious work has already identified some approaches to efficiently execute relational subqueries. For satisfactory performance, proper choice of subquery execution strategies becomes even more essen-tial today with the increase in decision support systems and auto-matically generated SQL, e.g., with ad-hoc reporting tools. This goes hand in hand with increasing query complexity and growing data volumes – which all pose challenges for an industrial-strength query optimizer. This current paper explores the basic building blocks that Microsoft SQL Server utilizes to optimize and execute relational subqueries. We start with indispensable prerequisites such as detection and removal of correlations for subqueries. We identify a full spectrum of fundamental subquery execution strategies such as forward and reverse lookup as well as set-based approaches, explain the different execution strategies for subqueries implemented in SQL Server, and relate them to the current state of the art. To the best of our knowl-edge, several strategies discussed in this paper have not been pub-lished before. An experimental evaluation complements the paper. It quantifies the performance characteristics of the different approaches and shows that indeed alternative execution strategies are needed in different circumstances, which make a cost-based query optimizer indispen-sable for adequate query performance

CiteSeerX

Crossref

ABSTRACT PowerDB-IR: Information Retrieval on Top of a Database Cluster

Author: Torsten Grabs
Publication venue
Publication date
Field of study

Our current concern is a scalable infrastructure for information retrieval (IR) with up-to-date retrieval results in the presence of frequent, continuous updates. Timely processing of updates is important with novel application domains, e.g., e-commerce. We want to use off-the-self hardware and software as much as possible. These issues are challenging, given the additional requirement that the resulting system must scale well. We have built PowerDB-IR, a system that has the characteristics sought. This paper describes its design, implementation, and evaluation. PowerDB-IR is a coordination layer for a database cluster. The rationale behind a database cluster is to ’scale-out’, i.e., to add further cluster nodes, whenever necessary for better performance. We build on IR-to-database mappings and service decomposition to support high-level parallelism. We follow a three-tier architecture with the database cluster as the bottom layer for storage management. The middle tier provides IR-specific processing and update services. PowerDB-IR has the following features: It allows to insert and retrieve documents concurrently, and it ensures freshness with almost no overhead. Alternative physical data organization schemes provide adequate performance for different workloads. Query processing techniques for the different data organizations efficiently integrate the ranked retrieval results from the cluster nodes. We have run extensive experiments with our prototype using commercial database systems and middleware software products. The main result is that PowerDB-IR shows surprisingly ideal scalability and low response times. 1

CiteSeerX

ABSTRACT XMLTM: Efficient Transaction Management for XML Documents

Author: Klemens Böhm
Torsten Grabs
Publication venue
Publication date
Field of study

A common approach to storage and retrieval of XML documents is to store them in a database, together with materialized views on their content. The advantage over ”native” XML storage managers seems to be that transactions and concurrency are for free, next to other benefits. But a closer look and preliminary experiments reveal that this results in poor performance of concurrent queries and updates. The reason is that database lock contention hinders parallelism unnecessarily. We therefore investigate concurrency control at the semantic, i.e., XML level and describe a respective transaction manager XMLTM. It features a new locking protocol DGLOCK. It generalizes the protocol for locking on directed acyclic graphs by adding simple predicate locking on the content of elements, e.g., on their text. Instead of using the original XML documents, we propose to take advantage of an abstraction of the XML document collection known as DataGuides. XMLTM allows to run XML processing at the underlying database at low ANSI isolation degrees and to release database locks early without sacrificing correctness in this setting. We have built a complete prototype system that is implemented on top of the XML Extender for IBM DB2. Our evaluation shows that our approach consistently yields performance improvements by an order of magnitude. We stress that our approach can also be implemented within a native XML storage manager, and we expect even better performance

CiteSeerX

ETH Zürich at INEX: Flexible Information Retrieval from XML with PowerDB-XML

Author: Hans-Jörg Schek
Torsten Grabs
Publication venue: ERCIM Publications
Publication date
Field of study

When searching for relevant information in XML documents, users want to exploit the document structure when posing their queries. Therefore, queries over XML documents dynamically restrict the context of interest to arbitrary combinations of XML element types. State-of-the-art information retrieval (IR) however derives statistics such as document frequencies for the collection as a whole. With contexts of interest defined dynamically by user queries, this may lead to inconsistent rankings with XML documents that have heterogeneous content from different domains. To guarantee consistent retrieval, our XML engine PowerDB-XML derives the appropriate IR statistics that consistently reflect the scope of interest defined by the user query on-the-fly, i.e., at query runtime. To compute the dynamic IR statistics efficiently, our implementation relies on underlying basic indexes and statistics data. This paper reports on our experiences from participating in INEX, the INitiative for the Evaluation of XML retrieval

CiteSeerX

A parallel document engine built on top of a cluster of databases: Design, implementation, and experiences

Author: Böhm Klemens
Grabs Torsten
Schek Hans-Jörg
Publication venue: Swiss Federal Institute of Technology, Department of Computer Science, Institute of Information Systems, Database Research Group
Publication date: 01/01/2000
Field of study

Repository for Publications and Research Data

XMLTM: High-Performance XML Extensions for Commercial Database Systems

Author: Hans-Jörg Schek
Klemens Böhm
Torsten Grabs
Publication venue
Publication date
Field of study

Due to the success of XML for data interchange, relational database products now include support for processing of XML data. A common approach, subsequently referred to as XML extensions, stores XML documents in characterlarge-object (CLOB) attributes and extends the database engine with XML-specific functionality to read and update these attributes. In addition, XML extensions materialize views on XML content in so-called side tables. Triggers guarantee consistency of documents and view materializations. However, a series of preliminary experiments with XML extensions has revealed that performance of concurrent queries and updates is low. This motivates to design XML extensions more carefully, aiming at better performance. This study identifies two important shortcomings of XML extensions: (1) Database lock contention hinders parallelism unnecessarily. (2) Querying and updating of an XML document requires to load the entire document into an internal representation. We propose a solution that addresses these problems as follows: Its core is a transaction manager XMLTM. It features a new locking protocol DGLOCK. DGLOCK generalizes the protocol for locking on directed acyclic graphs for XML data. XMLTM allows to run XML processing at low ANSI isolation degrees and to release database locks early without sacrificing correctness. Regarding the second problem, we make use of the idea of decomposing XML documents into fragments. The rationale is that the internal representation is generated only for the relevant fragments. We have implemented our solution on top of the XML Extender for IBM DB2. Our experimental evaluation shows that our approach consistently yields performance improvements by an order of magnitude

CiteSeerX

Fine-Grained Lazy Replication with Strict Freshness and Correctness Guarantees

Author: Akal Fuat
Breitbart Yuri
Grabs Torsten
Schek Hans-Jörg
Türker Can
Publication venue: ETH, Department of Computer Science
Publication date: 01/01/2004
Field of study

Eager replication management is known to generate unacceptable performance as soon as the update rate or the number of replicas increases. Lazy replication protocols tackle this problem by decoupling transaction execution from the propagation of new values to replica sites while guaranteeing a correct and more efficient transaction processing and replica maintenance. However, they impose several restrictions on transaction models that are often not valid in practical database settings, e.g., they require that each transaction executes at its initiation site and/or are restricted to full replication schemes. Also, the protocols cannot guarantee that the transactions will always see the freshest available replicas. This paper presents a new lazy replication protocol called PDBREP that is free of these restrictions while ensuring one copy serializable executions. The protocol exploits the distinction between read-only and general transactions and works with arbitrary physical data organizations such as partitioning and striping as well as different replica granularities. It does not require that each read-only transaction executes entirely at its initiation site. Hence, each read-only site need not contain a fully replicated database. PDBREP furthermore generalizes the notion of freshness to finer data granules than entire databases. Beside its architectural advantages, experiments revealed that PDBREP outperforms related lazy replication techniques

Repository for Publications and Research Data

CiteSeerX

Tree Awareness for Relational DBMS Kernels: Staircase Join

Author: Blanken Henk
Grabs Torsten
Grust Torsten
Schek Hans-Jörg
Schenkel Ralf
van Keulen Maurice
Weikum Gerhard
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2003
Field of study

Relational database management systems (RDBMSs) derive much of their efficiency from the versatility of their core data structure: tables of tuples. Such tables are simple enough to allow for an efficient representation on all levels of the memory hierarchy, yet sufficiently generic to host a wide range of data types. If one can devise mappings from a data type tau to tables and from operations on tau to relational queries, an RDBMS may be a premier implementation alternative. Temporal intervals, complex nested objects, and spatial data are sample instances for such types